Checking for Normality with the Royston’s Test in R

Introduction

Royston’s Test is a statistical method for assessing multivariate normality. It extends the Shapiro-Wilk test, which is designed for univariate data, to multivariate datasets. In R, Royston’s Test is most commonly run through the mvn() function in the MVN package.
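Royston’s statistic is built from one Shapiro-Wilk W statistic per variable, which it then transforms and combines. The sketch below illustrates only that univariate building block, not Royston’s full statistic. It uses base R’s shapiro.test() on each numeric column of iris:

```r
# Shapiro-Wilk test on each numeric column of the built-in iris data.
# This shows only the per-variable ingredient that Royston's test combines;
# it is not Royston's multivariate statistic itself.
num_data <- iris[, 1:4]

sw_results <- lapply(num_data, shapiro.test)

# Collect the W statistics and p-values into one data frame
sw_table <- data.frame(
  variable  = names(sw_results),
  W         = sapply(sw_results, function(t) unname(t$statistic)),
  p_value   = sapply(sw_results, function(t) t$p.value),
  row.names = NULL
)
print(sw_table)
```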

Output:

The mvn() function returns Royston’s test statistic and its p-value, which indicate whether the data deviate significantly from multivariate normality. You will see:

• Royston’s test statistic

• p-value

• Decision: whether the data are consistent with a multivariate normal distribution (reported as YES or NO in the MVN column).

Key Advantages

  • Tests multivariate normality, not just univariate normality.

  • Takes correlations between variables into account.

  • Suitable for small and moderate datasets.

  • Easy to use in R via the MVN package.

  • Provides a p-value for deciding whether to reject the null hypothesis of multivariate normality.
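Royston’s test uses the correlations between variables to adjust how the per-variable results are combined. This is why strongly related measurements matter. The correlation matrix of the four iris measurements shows how related they are, and computing it needs only base R:

```r
# Pearson correlations between the four numeric iris measurements
r <- cor(iris[, 1:4])
print(round(r, 2))

# The two petal measurements are very strongly correlated
r["Petal.Length", "Petal.Width"]
```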

Installing and loading the required package

install.packages("MVN")
library(MVN)
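When the script may run on machines that already have MVN, a common pattern is to install it only if it is missing. It is also worth checking the installed version. The argument and result names used in this article (mvnTest and $multivariateNormality) follow the MVN 5.x interface, and other releases may name them differently. If they do, consult ?mvn for your installed version.

```r
# Install MVN only if it is not already available, then load it
if (!requireNamespace("MVN", quietly = TRUE)) {
  install.packages("MVN")
}
library(MVN)

# Report the installed version; argument names can differ between releases
packageVersion("MVN")
```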

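About the dataset

The built-in iris dataset records four measurements (in centimetres) for 150 flowers from three species: setosa, versicolor and virginica.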

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
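head() shows only the first six rows, and all of them are setosa flowers. To see the full shape of the data, including the number of flowers per species, base R’s dim() and table() are enough:

```r
# Dimensions of the built-in iris data: rows (flowers) and columns
dim(iris)

# Number of flowers per species
table(iris$Species)

# Which columns are numeric and can be passed to mvn()
sapply(iris, is.numeric)
```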

Syntax

result <- mvn(data = your_data, mvnTest = "royston")
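In the syntax line, your_data stands for any data frame of numeric variables. The sketch below wraps that call in a small helper that drops non-numeric columns first. The name royston_check is hypothetical and is not part of MVN. The helper assumes the MVN 5.x mvnTest argument used in this article:

```r
library(MVN)

# Hypothetical helper: keep only numeric columns, then run Royston's test.
# Assumes the MVN 5.x interface (mvnTest argument) used in this article.
royston_check <- function(df) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  if (sum(numeric_cols) < 2) {
    stop("expected at least two numeric columns for a multivariate test")
  }
  mvn(data = df[, numeric_cols, drop = FALSE], mvnTest = "royston")
}

# Usage: the Species factor is dropped automatically
res <- royston_check(iris)
res$multivariateNormality
```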

Example Using the Iris Dataset

We’ll use the numeric columns of the built-in iris dataset (excluding the species column).

Run Royston’s test

result <- mvn(data = iris[, 1:4], mvnTest = "royston")

Explanation of the code

  • iris[, 1:4]: selects the first four columns of iris. These hold the numeric flower measurements used in the multivariate normality test.

  • mvn(): the main function of the MVN package for testing multivariate normality. It can run several multivariate tests, including Mardia’s, Henze-Zirkler’s and Royston’s. By default it also reports univariate normality tests and descriptive statistics.

  • mvnTest = "royston": specifies that Royston’s test should be used for the multivariate normality test.
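Because mvn() offers several multivariate tests through the same mvnTest argument, you can run more than one to see how much the conclusion depends on the choice of test. This sketch assumes the MVN 5.x option names "royston" and "hz" (Henze-Zirkler):

```r
library(MVN)

num_data <- iris[, 1:4]

# Run Royston's test and the Henze-Zirkler test on the same data
royston_res <- mvn(data = num_data, mvnTest = "royston")
hz_res      <- mvn(data = num_data, mvnTest = "hz")

# Each result's $multivariateNormality element is a small summary table
print(royston_res$multivariateNormality)
print(hz_res$multivariateNormality)
```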

View results

print(result)
$multivariateNormality
     Test        H      p value MVN
1 Royston 50.39667 3.098229e-11  NO

$univariateNormality
              Test     Variable Statistic   p value Normality
1 Anderson-Darling Sepal.Length    0.8892  0.0225      NO    
2 Anderson-Darling Sepal.Width     0.9080  0.0202      NO    
3 Anderson-Darling Petal.Length    7.6785  <0.001      NO    
4 Anderson-Darling Petal.Width     5.1057  <0.001      NO    

$Descriptives
               n     Mean   Std.Dev Median Min Max 25th 75th       Skew   Kurtosis
Sepal.Length 150 5.843333 0.8280661   5.80 4.3 7.9  5.1  6.4  0.3086407 -0.6058125
Sepal.Width  150 3.057333 0.4358663   3.00 2.0 4.4  2.8  3.3  0.3126147  0.1387047
Petal.Length 150 3.758000 1.7652982   4.35 1.0 6.9  1.6  5.1 -0.2694109 -1.4168574
Petal.Width  150 1.199333 0.7622377   1.30 0.1 2.5  0.3  1.8 -0.1009166 -1.3581792
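The statistic, p-value and decision can also be pulled out of the result object instead of being read from the printed tables. The column names below (H, p value, MVN) are taken from the printed output above, and the code assumes the MVN 5.x result layout:

```r
library(MVN)

result <- mvn(data = iris[, 1:4], mvnTest = "royston")
mv <- result$multivariateNormality

# Column names as they appear in the printed output
h_stat   <- mv$H
p_value  <- mv[["p value"]]
decision <- trimws(as.character(mv$MVN))

cat("Royston H:", h_stat, "\n")
cat("p-value:  ", p_value, "\n")
cat("MVN:      ", decision, "\n")
```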

Interpretation

  1. Multivariate Normality (Royston Test)
  • Test: Royston

  • H statistic: 50.39667

  • p-value: 3.098229e-11 (very small)

  • MVN: “NO”

Royston’s test has a very small p-value (far below 0.05). We therefore reject the null hypothesis of multivariate normality and conclude that the four measurements do not follow a multivariate normal distribution.

  2. Univariate Normality (Anderson-Darling Test)

For all four variables (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), the p-values are below 0.05. The Anderson-Darling test therefore rejects normality for each variable individually.
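The univariate results can also be filtered programmatically. The p value column in this table mixes numbers with entries such as "<0.001", so it is simpler to filter on the Normality column. The sketch below also uses the univariateTest argument, which selects the univariate test in MVN 5.x. It assumes "SW" (Shapiro-Wilk) is an accepted value, so check ?mvn for your installed version:

```r
library(MVN)

# Re-run with Shapiro-Wilk as the univariate test (MVN 5.x argument name)
result_sw <- mvn(data = iris[, 1:4], mvnTest = "royston",
                 univariateTest = "SW")
uni <- result_sw$univariateNormality

# Values may be padded with spaces in the printed table, so trim them first
non_normal <- trimws(as.character(uni$Variable))[
  trimws(as.character(uni$Normality)) == "NO"
]
print(uni)
non_normal
```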

Key Observation:

The joint Royston test and each marginal Anderson-Darling test all reject normality, so the departure from normality is not confined to a single variable.
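One plausible reason for the rejection is that iris pools three species whose petal measurements differ sharply. The pooled data therefore form a mixture rather than a single normal cloud. The sketch below runs the same mvn() call on each species separately, using base R’s split() and lapply():

```r
library(MVN)

# Split the numeric columns by species and test each group separately
groups <- split(iris[, 1:4], iris$Species)

by_species <- lapply(groups, function(d) {
  mvn(data = d, mvnTest = "royston")$multivariateNormality
})

# One Royston summary table per species (setosa, versicolor, virginica)
by_species
```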

Conclusion:

Royston’s Test offers a practical way to assess multivariate normality by extending the widely used Shapiro-Wilk test to multidimensional data. It is particularly useful in multivariate analyses where the normality assumption is critical, such as MANOVA, factor analysis and multivariate regression. Royston’s Test is less widely known than some other multivariate normality tests. Even so, its availability in the MVN package makes it an accessible and valuable tool for researchers and analysts.